Sequence Identification Utilizing Curated Custom Databases

Authors

  • Dan Kuyper
  • Hesham Ali
  • Steven Hinrichs
Abstract

This paper introduces BioDatabase, a web-based application for genetic sequence identification through the creation of custom databases. Each custom database contains a relatively small set of genetic data specific to a species or a subset of a species. Each sequence in a custom database contains genetic regions with a highly conserved start and end sequence, along with other specific characteristics defined by an administrator. The package addresses two problems with GenBank, whose information is valuable yet difficult to use properly because of its size and quality. GenBank's size also prevents optimal alignment algorithms from being used when identifying sequences against it. BioDatabase integrates this information, allowing optimal alignment algorithms to be run against local data obtained from GenBank or entered manually by a researcher. This maximizes application performance with minimal impact on the speed at which results are returned to researchers. The BioDatabase package allows researchers to formulate sequence identification concepts and test their ideas against a validated database. Our concluding case study highlights this capability. It involves Mycobacterium identification using the 16S and ITS genetic regions. In the case study, researchers correctly identified 72 of 78 Mycobacterium isolates through new sequence identification techniques using the BioDatabase package. In most cases, these results provided better identification of Mycobacterium than existing techniques.
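To illustrate why a small custom database makes optimal alignment practical, the following is a minimal sketch, not the BioDatabase implementation. The abstract does not name the alignment algorithm used, so Smith-Waterman local alignment is assumed here as one common optimal method. The function names, scoring parameters, and the three short sequences are all hypothetical toy values chosen for illustration.

```python
from typing import Dict, Tuple


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local-alignment score between a and b.

    Uses a linear gap penalty and keeps only two rows of the
    dynamic-programming table, since only the score is needed.
    The scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def identify(query: str, database: Dict[str, str]) -> Tuple[str, int]:
    """Score the query against every entry and return the best (name, score)."""
    scores = {name: smith_waterman_score(query, seq)
              for name, seq in database.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]


# Hypothetical toy database; real entries would be curated genetic regions.
custom_db = {
    "species_A": "ACGTACGTGGCTAAGT",
    "species_B": "TTGACCGTAGGCTTAA",
    "species_C": "ACGTTCGTGGATAAGT",
}

if __name__ == "__main__":
    name, score = identify("ACGTACGTGGCTAAGT", custom_db)
    print(name, score)
```

Each comparison costs time proportional to the product of the two sequence lengths, so an exhaustive scan of this kind is feasible for a curated database of modest size but not for a repository as large as GenBank.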

Similar resources

The Use of Genus-Specific Amplicon Pyrosequencing to Assess Phytophthora Species Diversity Using eDNA from Soil and Water in Northern Spain

Phytophthora is one of the most important and aggressive plant pathogenic genera in agriculture and forestry. Early detection and identification of its pathways of infection and spread are of high importance to minimize the threat they pose to natural ecosystems. eDNA was extracted from soil and water from forests and plantations in the north of Spain. Phytophthora-specific primers were adapted...

ProPepper: a curated database for identification and analysis of peptide and immune-responsive epitope composition of cereal grain protein families

ProPepper is a database that contains prolamin proteins identified from true grasses (Poaceae), their peptides obtained with single- and multi-enzyme in silico digestions as well as linear T- and B-cell-specific epitopes that are responsible for wheat-related food disorders. The integrated database and analysis platform contains datasets that are collected from multiple public databases (Unipro...

SMART: identification and annotation of domains from signalling and extracellular protein sequences

SMART is a simple modular architecture research tool and database that provides domain identification and annotation on the WWW (http://coot.embl-heidelberg.de/SMART). The tool compares query sequences with its databases of domain sequences and multiple alignments whilst concurrently identifying compositionally biased regions such as signal peptide, transmembrane and coiled coil segments. Annot...

BIR Pipeline for Preparation of Phylogenomic Data

SUMMARY We present a pipeline named BIR (Blast, Identify and Realign) developed for phylogenomic analyses. BIR is intended for the identification of gene sequences applicable for phylogenomic inference. The pipeline allows users to apply their own manually curated sequence alignments (seed) in search for homologous genes in sequence databases and available genomes. BIR automatically adds the id...

UniProt: the Universal Protein knowledgebase

To provide the scientific community with a single, centralized, authoritative resource for protein sequences and functional information, the Swiss-Prot, TrEMBL and PIR protein database activities have united to form the Universal Protein Knowledgebase (UniProt) consortium. Our mission is to provide a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase...

Publication date: 2003